ABSTRACT


Large,  complex  computer  simulation  models  can  require 
prohibitively  costly  and  time-consuming  experimental  programs  to 
study their behavior. Therefore we may want to concentrate the
analysis  on  the  set  of  "most  important"  factors  (i.e.,  input 
variables).  Factor  screening  experiments,  which  attempt  to 
identify  the  more  important  variables,  can  be  extremely  useful  in 
the  study  of  such  models.  The  number  of  computer  runs  available 
for  screening,  however,  is  usually  severely  limited.  In  fact, 
the  number  of  factors  often  exceeds  the  number  of  available  runs. 
In this paper we present a survey of supersaturated designs for
use  in  factor  screening  experiments.  The  designs  considered  are: 
random  balance,  systematic  supersaturated,  group  screening, 
modified  group  screening,  T-optimal,  R-optimal,  and  search 
designs.  We  discuss  in  general  terms  the  basic  technique, 
advantages,  and  disadvantages  of  each  procedure  surveyed. 


1.  INTRODUCTION

Large-scale computer simulation models, because of their
size and running time, can require prohibitively costly and
time-consuming experimental programs to study their behavior.
Often it is anticipated, however, that only relatively few
factors (i.e., input variables) will have major effects.
Therefore, one may want to conduct an efficient preliminary
experiment to determine the subset of "most important" factors.
Once the most important factors have been identified, subsequent
experimentation can focus on these particular factors, thereby
eliminating experimentation with relatively unimportant factors
which can needlessly consume resources.

In some situations, of course, prior information concerning
the experimental factors is available and can be used to identify
those factors which are most likely to be of importance.

Although  the  use  of  prior  information  can  be  beneficial  to  the 
screening  process,  in  this  paper  we  assume  that  there  is  no  prior 
information  available.  This  would  not  exclude,  for  example, 
situations  in  which  prior  information  is  available  on  the 
importance  of  some  of  the  factors,  and  it  is  desired  to  examine 
the  remaining  factors  in  a  screening  experiment. 

The  function  of  a  factor  screening  experiment  is  to  sort  the 
factors  into  two  groups.  One  group  consists  of  the  important 
factors  which  are  judged  worthwhile  to  investigate  further,  while 
the  other  consists  of  the  remaining  factors  classified  as 
unimportant.  In  general,  in  screening  experiments  we  want  (a)  to 
detect  as  many  important  factors  as  possible,  (b)  to  declare 
important  as  few  unimportant  factors  as  possible,  and  (c)  to 
expend  as  few  computer  runs  as  possible.  Since  these  are 
conflicting  objectives,  one  must  generally  trade  off  how  many 
runs  a  screening  method  requires  against  how  accurately  it 
classifies  factors. 

The  screening  problem  can  occur  in  two  general  situations. 
These  are  the  unsaturated/saturated  and  the  supersaturated 
situations. In the unsaturated/saturated situation, one can
afford to invest more runs than there are factors. In the
supersaturated  situation,  the  number  of  factors  equals  or  exceeds 
the  number  of  runs  available  for  screening.  Although  screening 
can  be  done  more  effectively  in  the  unsaturated/saturated 
situation,  the  supersaturated  situation  is  a  common  and  practical 
situation  in  the  analysis  of  large-scale  simulation  models. 

Here,  once  again,  design  economy  is  the  primary  consideration. 

Supersaturated design procedures are not customarily
discussed in textbooks on experimental design, and there are few
examples of such experiments in the statistical or simulation
literature. This paper presents a survey of supersaturated
designs for use in factor screening, with application to
large-scale simulation models. The designs considered are: (i)
random balance designs, (ii) systematic supersaturated designs,
(iii) group screening designs, (iv) modified group screening
designs, (v) T-optimal designs, (vi) R-optimal designs, and (vii)
search designs. Our intent is to provide a broad overview of
supersaturated screening methods and to discuss in general terms
the basic technique, analysis procedures, advantages, and
disadvantages of each method surveyed. Appropriate references
are provided if further information is desired.

2.  A  SCREENING  MODEL 

For  the  purpose  of  detecting  the  factors  which  have  major 
effects,  it  generally  suffices  to  assume  the  first-order  model 

$$y_i = \beta_0 + \sum_{j=1}^{K} \beta_j x_{ij} + \varepsilon_i$$

where $y_i$ is the value of the response (i.e., output variable) in
the ith simulation run, K is the total number of factors, each of
which is at two normalized levels (coded $\pm 1$), $x_{ij}$ is the level of
the jth factor during the ith simulation run, $\beta_0$ is a constant
component common to all observations, $\beta_j$ ($j \ge 1$) is the (linear)
effect of the jth factor, and $\varepsilon_i$ is a random error component with
mean 0 and unknown variance $\sigma^2$.

A common interpretation of this model is that it represents
a first-order Taylor series approximation to the true
relationship between the output $y_i$ and the normalized input
variables $x_1, x_2, \ldots, x_K$. Moreover, the coefficients $\beta_1, \beta_2,
\ldots, \beta_K$ can be related to the sensitivity of the output variable
to changes made in the input variables, at least in the
vicinity of their nominal values. Ordinarily this model would be
used over a relatively small region of the factor space.
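As a rough illustration of the first-order model above, the following Python sketch evaluates the response for a small two-level design. The function name `simulate_response`, the design matrix, and the coefficient values are hypothetical choices made only for this example; normally distributed errors are also an assumption, since the model specifies only a mean of 0 and an unknown variance.

```python
import numpy as np

def simulate_response(X, beta0, beta, sigma, rng):
    """Evaluate y_i = beta0 + sum_j beta_j * x_ij + e_i for every run (row) of X."""
    X = np.asarray(X, dtype=float)
    # Assumption: errors are drawn as independent normals with mean 0 and std sigma.
    errors = rng.normal(0.0, sigma, size=X.shape[0])
    return beta0 + X @ np.asarray(beta, dtype=float) + errors

# Hypothetical example: N = 4 runs, K = 3 factors coded -1/+1.
X = np.array([[+1, -1, +1],
              [-1, +1, +1],
              [+1, +1, -1],
              [-1, -1, -1]])
beta = [2.0, 0.0, 0.5]            # the second factor has no effect
rng = np.random.default_rng(0)    # arbitrary seed
y = simulate_response(X, beta0=10.0, beta=beta, sigma=0.0, rng=rng)
print(y)
```

With `sigma=0.0` the output is deterministic, which makes it easy to check a design by hand before adding noise.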

3.  SUPERSATURATED  SCREENING  DESIGNS 

Screening  designs  can  be  classified  as  either  "fixed"  or 
"sequential"  designs.  In  a  fixed,  or  nonsequential,  design,  the 
factors  are  screened  based  on  a  given  set  of  observations  (i.e., 
computer  runs).  In  a  sequential  design,  results  from  a 
first-stage  design  are  used  to  provide  information  on  how  to  set 
up  the  design  used  in  the  next  stage,  and  so  on.  All  of  the 
designs  considered  in  this  paper  are  fixed,  with  the  exception  of 
group  screening  designs  which  are  sequential. 

3.1  Random Balance Designs

Random balance (RB) designs, introduced by Budne (1959a,
1959b) and Satterthwaite (1959), were discussed at length by
Anscombe (1959) and Youden et al. (1959). See also Dempster
(1960), Mauro and Smith (1984), and Mauro and Burns (1984).

In a two-level ($\pm 1$) RB design, each column of the design
matrix consists of N/2 +1's and N/2 -1's where N, an even number,
denotes the total number of runs to be made. The +1's and -1's
in each column are assigned randomly, making all possible
combinations of N/2 +1's and N/2 -1's equally likely, with each
column receiving an independent randomization.
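The construction just described can be sketched in a few lines of Python. The function name `random_balance_design` and the seed are illustrative assumptions; the only requirements taken from the text are that N is even, each column holds N/2 entries at each level, and each column is randomized independently.

```python
import numpy as np

def random_balance_design(n_runs, n_factors, rng):
    """Return an n_runs x n_factors matrix of -1/+1 with balanced, independently shuffled columns."""
    if n_runs % 2 != 0:
        raise ValueError("n_runs must be even")
    base = np.array([+1] * (n_runs // 2) + [-1] * (n_runs // 2))
    # A uniform random permutation of the base column makes every balanced
    # arrangement equally likely; each column gets its own permutation.
    return np.column_stack([rng.permutation(base) for _ in range(n_factors)])

rng = np.random.default_rng(1959)   # arbitrary seed
D = random_balance_design(n_runs=8, n_factors=20, rng=rng)
print(D.shape, D.sum(axis=0))
```

Note that the number of factors (20) exceeds the number of runs (8) here, which is the supersaturated situation; nothing in the construction ties one to the other.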

The principal advantage of the RB method is its flexibility.
The number of runs N can be selected independently of the number
of factors K. No mathematical restriction or relationship,
except that N be an even number, need exist between N and K. A
second  advantage  is  that  RB  designs  are  very  easy  to  prepare  for 
any  combination  of  N  and  K.  This  latter  advantage  can  be  an 
important  consideration  when  K  is  large. 

The  major  disadvantage  of  RB  designs  is  that  confounding  is 
random.  Anscombe  (1959)  has  written: 

The  fact  that  the  degree  of  nonorthogonality  or  unbalance  is 
random  can  be  made  the  basis  for  an  objection  to  the  whole 
notion  of  random  balance  designs.  Such  designs  may  work 
well on the average, but should I trust to one on this
occasion? 

Indeed,  the  lack  of  control  over  the  confounding  in  RB  designs 
has been a controversial aspect since such experimentation was
first  proposed. 

Another  disadvantage  of  RB  designs  is  that  there  is  no 
generally  accepted  or  established  method  of  analysis  for  these 
designs.  In  fact,  the  problem  of  analysis  is  characteristic  of 
supersaturated  designs  with  irregular  confounding  patterns  and  is 
not  peculiar  to  the  RB  method.  The  simplest  analysis  approach  is 
to  consider  each  factor  separately,  ignoring  all  other  factors, 
and  apply  some  standard  analysis  technique  such  as  an  F-test. 

More sophisticated analysis methods which can be used include
variable  selection  procedures  such  as  least  squares  stepwise  and 
stagewise  regression  (see,  for  example,  Draper  and  Smith  1981). 

The simple least squares estimator of $\beta_j$ obtained by
ignoring all other factors is given by

$$\hat{\beta}_j = (\bar{y}_{+j} - \bar{y}_{-j})/2$$

where $\bar{y}_{+j}$ ($\bar{y}_{-j}$) is the average value of the response over the N/2
runs at the +1 (-1) level of the jth factor. Let $\mathbf{y}$ denote the N×1
vector $(y_1, y_2, \ldots, y_N)'$ of responses and $\mathbf{x}_j$ denote the N×1 vector
$(x_{1j}, x_{2j}, \ldots, x_{Nj})'$. Further, let $X = [\mathbf{1}, \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_K]$, where $\mathbf{1}$
designates an N×1 vector of +1's. In an RB design, the matrix X
is, by construction, stochastic (except, of course, for the
initial column of +1's). Assuming that X and the $\varepsilon_i$ are
independent, it is easily shown that conditional on X,

$$E(\hat{\beta}_j \mid X) = \beta_j + \Big(\sum_{i \ne j} \beta_i \mathbf{x}_i' \mathbf{x}_j\Big)/N \qquad (1)$$

and

$$V(\hat{\beta}_j \mid X) = \sigma^2/N. \qquad (2)$$

The conditional mean square error (MSE) of $\hat{\beta}_j$ is then

$$\mathrm{MSE}(\hat{\beta}_j \mid X) = \sigma^2/N + \Big(\sum_{i \ne j} \beta_i \mathbf{x}_i' \mathbf{x}_j\Big)^2/N^2.$$
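The conditional bias in (1) can be checked numerically. The sketch below is a minimal illustration with no experimental error, so each estimate equals its conditional mean exactly; the helper names, the 4-run design, and the coefficient values are all hypothetical.

```python
import numpy as np

def simple_effect_estimates(D, y):
    """Simple estimator (ybar_plus - ybar_minus)/2 for each balanced column, i.e. x_j'y / N."""
    D = np.asarray(D, dtype=float)
    return D.T @ np.asarray(y, dtype=float) / D.shape[0]

def conditional_bias(D, beta):
    """Bias term of equation (1): sum over i != j of beta_i * x_i'x_j, divided by N."""
    D = np.asarray(D, dtype=float)
    C = D.T @ D                  # matrix of column inner products
    np.fill_diagonal(C, 0.0)     # drop the i == j terms
    return C @ np.asarray(beta, dtype=float) / D.shape[0]

# Hypothetical balanced design in which columns 1 and 3 happen to coincide.
D = np.array([[+1, +1, +1],
              [+1, -1, +1],
              [-1, +1, -1],
              [-1, -1, -1]])
beta0, beta = 10.0, np.array([2.0, 0.0, 0.5])
y = beta0 + D @ beta             # sigma^2 = 0, so no error term
est = simple_effect_estimates(D, y)
bias = conditional_bias(D, beta)
print(est, bias)
```

Because the first and third columns are identical, each of those two estimates absorbs the other factor's effect in full, which is the extreme case of the random confounding discussed in the text.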

Unconditionally, it can be shown that

$$E(\hat{\beta}_j) = \beta_j \qquad (3)$$

and

$$V(\hat{\beta}_j) = \tau_j^2/(N-1) + \sigma^2/N \qquad (4)$$

where $\tau_j^2 = \sum_{i \ne j} \beta_i^2$.
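For a very small case, the unconditional results (3) and (4) can be verified by enumerating every possible RB design rather than by sampling. The sketch below assumes N = 4 runs, K = 3 factors, and no experimental error, so that (4) reduces to $\tau_j^2/(N-1)$; the coefficient values are hypothetical.

```python
import itertools
import numpy as np

def balanced_columns(n_runs):
    """All columns of length n_runs with exactly n_runs/2 entries equal to +1."""
    cols = []
    for plus in itertools.combinations(range(n_runs), n_runs // 2):
        col = -np.ones(n_runs)
        col[list(plus)] = 1.0
        cols.append(col)
    return cols

N = 4
beta0, beta = 5.0, np.array([3.0, 1.0, 2.0])   # hypothetical effects
estimates = []
# Every combination of balanced columns is an equally likely RB design.
for design in itertools.product(balanced_columns(N), repeat=len(beta)):
    D = np.column_stack(design)
    y = beta0 + D @ beta                        # sigma^2 = 0
    estimates.append(D[:, 0] @ y / N)           # simple estimator for the first factor
estimates = np.array(estimates)

mean_est = estimates.mean()
var_est = estimates.var()                       # variance over the design randomization
tau_sq = beta[1] ** 2 + beta[2] ** 2
print(mean_est, var_est, tau_sq / (N - 1))
```

The enumeration covers 6 x 6 x 6 = 216 designs, so the mean and variance it reports are exact for this case, not Monte Carlo approximations.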

As Box (1959) pointed out, equations (1) and (2) refer to
the behavior of the estimates for repetitions of a particular RB
design. Equations (3) and (4), on the other hand, refer to the
behavior of the estimates if we average over the random choice of
RB designs. Box noted that although $\hat{\beta}_j$ is unconditionally
unbiased, the effect of the conditional bias term in (1) is
transferred to the unconditional variance of $\hat{\beta}_j$ which now
contains terms from every other factor present. Conditionally,
therefore, one pays the price of having a biased estimator;
unconditionally, one pays the price of having a potentially
inflated variance. From either point of view, RB sampling would
seem inefficient for detecting all but the very large effects.

In  a  study  of  the  RB  method,  Mauro  and  Burns  (1984)  derived 
formulas  for  determining  (unconditional)  power  probabilities  of 
detecting  factor  effects  when  separate  F-tests  are  used  as  the 
method  of  analysis. 

Some further results regarding the use of separate F-tests
are that for $i \ne j$

$$\mathrm{cov}(\hat{\beta}_i, \hat{\beta}_j) = \beta_i \beta_j/(N-1)$$

and

$$\mathrm{corr}(\hat{\beta}_i, \hat{\beta}_j) \approx \beta_i \beta_j / [(\tau_i^2 + \sigma^2)(\tau_j^2 + \sigma^2)]^{1/2}. \qquad (5)$$

The correlation expressed in equation (5) is a measure of the
confounding between $\hat{\beta}_i$ and $\hat{\beta}_j$. It is interesting to note that an
increase in N does not reduce the confounding in an RB design
where simple least squares is used as the estimation method. As
indicated by (5), the confounding between $\hat{\beta}_i$ and $\hat{\beta}_j$ is primarily
a function of $\sigma^2$ and the magnitudes of the effects in the model.

It should also be noted that more sophisticated analysis
techniques such as variable selection procedures are not immune
to the adverse effects of the irregular confounding patterns
characteristic of supersaturated designs. It is well known that
multicollinearity in the predictor variables can cause serious
computational and statistical difficulties. See, for example,
Belsley, Kuh, and Welsch (1980) and Silvey (1969). Furthermore,
sequential selection procedures have the added problem that it is
difficult to control the true overall significance level (i.e.,
the probability of declaring important a negligible factor) of
such procedures.


3.2  Systematic  Supersaturated  Designs 


Because of the random confounding that occurs in RB designs,
Booth and Cox (1962) introduced balanced (i.e., an equal number
of +1's and -1's in each design column) two-level designs which
systematically attempt to minimize confounding. Since not all
design columns can be orthogonal when $N \le K$, Booth and Cox
constructed designs, which they termed systematic supersaturated
(SS) designs, that minimize $\max_{i \ne j} |c_{ij}|$, where $c_{ij} = \mathbf{x}_i' \mathbf{x}_j$. For two
or more designs with the same minimax value, the preferred SS
design is the one which minimizes the number of pairs of columns
attaining the minimax value.


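The cosine of the angle $\theta_{ij}$, $i \ne j$, between any two column
vectors, $\mathbf{x}_i$ and $\mathbf{x}_j$, is defined by

$$\cos \theta_{ij} = c_{ij}/(c_{ii} c_{jj})^{1/2} = c_{ij}/N.$$

The vectors $\mathbf{x}_i$ and $\mathbf{x}_j$ are orthogonal if $c_{ij} = 0$ or, in other words,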
if  the  cosine  of  the  angle  between  them  is  zero.  The  absolute 
value  of  the  inner  product,  therefore,  is  a  measure  of  the 
orthogonality  of  any  two  design  columns.  In  a  certain  sense, 
then,  SS  designs  are  constructed  as  nearly  orthogonal  as 
possible. 
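The criterion used to rank candidate SS designs can be written as a small Python helper. This is a sketch of the criterion only, not of the iterative search Booth and Cox used to find their designs; the function name `ss_criterion` and the two tiny 4-run candidates are hypothetical.

```python
import itertools
import numpy as np

def ss_criterion(D):
    """Return (max |c_ij| over column pairs, number of pairs attaining that maximum)."""
    D = np.asarray(D)
    worst, count = 0, 0
    for i, j in itertools.combinations(range(D.shape[1]), 2):
        c = abs(int(D[:, i] @ D[:, j]))   # |c_ij| = |x_i'x_j|
        if c > worst:
            worst, count = c, 1
        elif c == worst:
            count += 1
    return worst, count

# Two hypothetical balanced candidates with N = 4 runs and K = 4 factors.
design_a = np.array([[+1, +1, +1, +1],
                     [+1, -1, -1, +1],
                     [-1, +1, -1, -1],
                     [-1, -1, +1, -1]])
design_b = np.array([[+1, +1, +1, +1],
                     [+1, +1, -1, -1],
                     [-1, -1, +1, +1],
                     [-1, -1, -1, -1]])
# Tuples compare element by element: smaller maximum first, then fewer pairs attaining it.
preferred = min([design_a, design_b], key=ss_criterion)
print(ss_criterion(design_a), ss_criterion(design_b))
```

Both candidates share the same maximum inner product, so the tie is broken by the number of column pairs that reach it, which favors `design_a`.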


Booth and Cox tabulated their SS designs for the following
seven combinations of (N,K): (12,16), (12,20), (12,24), (18,24),
(18,30), (18,36), and (24,30). As they pointed out, designs for
intermediate values of K can be formed by dropping the final
columns  from  the  next  largest  SS  design.  The  designs  indicated 
above  were  obtained  with  the  aid  of  an  iterative  computer  search 
procedure,  since  it  was  impracticable  to  enumerate  all  possible 
designs  and  select  the  best. 

The  principal  advantage  of  SS  designs  is  that  they  attempt 
to minimize the confounding which inevitably occurs when $N \le K$.
The principal disadvantage is that these designs are not readily
available for combinations of N and K other than those already
tabulated. Furthermore, the time and expense needed to write and
to run a computer program for the generation of these designs may
be prohibitive, especially if K is large. Also, as in RB
designs, there remains the difficulty that the analysis of SS
designs is complicated by the confounding of factor effects.

A quick comparison of SS with RB designs can be made based
on an analysis of the variance of the inner product of two
columns chosen at random from the design. For RB designs, for
example, the variance of $\mathbf{x}_i' \mathbf{x}_j$ for any i and j ($i \ne j$) is $N^2/(N-1)$. Booth
and  Cox  made  such  a  comparison  for  the  seven  SS  designs  they 
derived  and  observed  that  SS  designs  are  substantially  better 
than  RB  designs  when  N  >  K/2.  As  would  be  expected,  SS  designs 
lose  their  advantage  when  N  is  small  relative  to  K. 
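The RB variance quoted above, $N^2/(N-1)$, can be confirmed for small N by exact enumeration. In the sketch below one balanced column is held fixed and the other ranges over all balanced columns, which by symmetry gives the same distribution as randomizing both; the function name is a hypothetical choice.

```python
import itertools
import numpy as np

def inner_product_variance(n_runs):
    """Exact variance of x_i'x_j when x_j is a uniformly random balanced -1/+1 column."""
    half = n_runs // 2
    x_i = np.array([1] * half + [-1] * half)      # fixed balanced column
    products = []
    for plus in itertools.combinations(range(n_runs), half):
        x_j = -np.ones(n_runs, dtype=int)
        x_j[list(plus)] = 1
        products.append(int(x_i @ x_j))
    return float(np.var(products))                # population variance (mean is 0)

for n in (4, 6, 8):
    print(n, inner_product_variance(n), n ** 2 / (n - 1))
```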

3.3  Group Screening Designs

In a group screening (GS) design we partition the individual
factors into groups of suitable sizes and then test these groups
by considering each as a single factor. The level of a
"group-factor" is defined by assigning the same level, either +1
or -1, to each component factor. Because the number of
group-factors is generally much smaller than the original number
of factors, we can usually study the group-factors in a standard
orthogonal design such as a Plackett-Burman (PB) design. PB
designs are two-level ($\pm 1$) orthogonal designs for studying up to
K=4m-1 factors in N=4m runs. PB designs were tabulated by
Plackett and Burman (1946) for $N \le 100$ and are minimal in the
number of runs required to achieve orthogonality. When N is a
power of two, PB designs are the same as resolution III $2^{k-p}$
fractional factorial designs (see Box and Hunter 1961).

The grouping and testing process can be repeated for any
number of stages. In each stage, however, we repartition into
smaller groups only those groups determined to be significant in
the previous stage. Further, we hold at a constant level any
factor not included in a subsequent stage so as not to bias any of
the later-stage group-factor estimates. In the final stage of
screening,  we  test  factors  individually  (i.e.,  the  group  size  is 
unity). 
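A two-stage version of this procedure can be sketched as follows. Several simplifications are assumptions of the example, not of the method: the orthogonal designs are built from Sylvester-type Hadamard matrices (so run sizes are powers of two, where PB designs coincide with fractional factorials), groups are declared significant by a fixed threshold on the estimated effect instead of a formal test, and the simulated model has no experimental error. All function names and numbers are hypothetical.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.kron(H, np.array([[1, 1], [1, -1]]))
    return H

def orthogonal_design(n_factors):
    """Balanced, mutually orthogonal -1/+1 columns for n_factors factors."""
    n = 2
    while n - 1 < n_factors:
        n *= 2
    return hadamard(n)[:, 1:n_factors + 1]        # skip the all-ones column

def two_stage_group_screening(simulate, n_factors, groups, threshold):
    # Stage 1: every factor takes the level of its group-factor.
    G = orthogonal_design(len(groups))
    X1 = np.zeros((G.shape[0], n_factors))
    for g, members in enumerate(groups):
        X1[:, members] = G[:, [g]]
    y1 = np.array([simulate(x) for x in X1])
    group_effects = G.T @ y1 / G.shape[0]
    kept = [j for g, members in enumerate(groups)
            if abs(group_effects[g]) > threshold for j in members]
    n_runs, effects = G.shape[0], {}
    # Stage 2: test the retained factors individually; hold all others at -1.
    if kept:
        F = orthogonal_design(len(kept))
        X2 = -np.ones((F.shape[0], n_factors))
        X2[:, kept] = F
        y2 = np.array([simulate(x) for x in X2])
        effects = dict(zip(kept, F.T @ y2 / F.shape[0]))
        n_runs += F.shape[0]
    return effects, n_runs

# Hypothetical model: K = 21 factors, two of them important.
beta = np.zeros(21)
beta[4], beta[10] = 4.0, -3.0
groups = [list(range(3 * g, 3 * g + 3)) for g in range(7)]   # 7 groups of 3
effects, n_runs = two_stage_group_screening(
    lambda x: 5.0 + x @ beta, n_factors=21, groups=groups, threshold=0.5)
print(n_runs, effects)
```

In this example the 21 factors are screened in 16 runs (8 in each stage). The total is not known in advance, since the second-stage size depends on how many groups pass the first stage.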

GS  designs  were  introduced  initially  by  Watson  (1961),  who 
considered  screening  in  two  stages.  The  GS  method  was  then 
generalized to more than two stages by both Li (1962) and Patel
(1962).  For  an  excellent  overview  of  GS  designs,  the  reader  may 
consult  Kleijnen  (1975). 

There  are  two  major  advantages  to  GS  designs.  The  first  of 
these is that we can to a certain extent control the confounding
pattern, since factors within a group are completely confounded
and factors in different groups are not confounded. Secondly,
the grouping process reduces the dimensionality of the model and
enables the use of orthogonal main effect designs, such as PB
designs, to test the significance of group-factors. Moreover,
such designs can be analyzed by the usual analysis of variance
procedures  for  factorial  experiments. 

There are two major disadvantages of the GS method. The
first of these is that the total number of runs required by a GS
procedure is not fixed, since the number of group-factors carried
over from stage to stage (when one progresses beyond the first
stage) is random. Thus, in a GS strategy, one generally does not
know prior to experimentation the exact total number of runs that
will be expended. A second major disadvantage is that the
optimal choice of group sizes and significance levels used in the
various  stages  of  group  screening  requires  prior  information  on 
certain  properties  of  the  underlying  model,  such  as  the 
proportion  of  important  effects.  The  problem  of  selecting 
optimal  two-stage  group  screening  designs  has  been  discussed  by 
Mauro  (1984)  and  Patel  and  Ottieno  (1984). 

Another  consideration  in  the  use  of  the  GS  approach  is  that 
important effects may cancel within a group. As a simple
example, consider two factors which have effects that are
negatives or near negatives of each other. If these two factors
are the only important factors in a group, their effects will
essentially cancel and their combined effect may be masked by
experimental error. Cancellation of effects cannot occur, of
course,  if  factor  levels  are  assigned  a  priori  so  that  all 
effects  are  in  the  same  direction.  See  Mauro  (1984),  Mauro  and 
Burns  (1984),  and  Mauro  and  Smith  (1982)  for  a  more  detailed 
analysis  of  the  effects  of  cancellation  on  the  performance  of 
two-stage  GS  designs. 

3.4  Modified Group Screening Designs

Because  an  analyst  might  be  reluctant  to  use  a  screening 
strategy  in  which  the  total  number  of  runs  cannot  be 
predetermined,  Mauro  and  Burns  (1984)  suggested  a  modified  GS 
procedure  in  two  stages  where  the  total  number  of  runs  can  be 
fixed  prior  to  experimentation.  Consider  a  two-stage  GS  strategy 
where the analyst decides beforehand not only on the number of
group-factors which will be tested in the first stage but also on
the number of group-factors, say m, which will be carried over to
the  second  stage.  After  the  first-stage  experiment,  the  m 
group-factors  with  the  largest  estimated  effects  are  determined 
and  their  component  factors  tested  individually  in  a  second-stage 
experiment.  In  this  strategy,  the  number  of  first-  and 
second-stage  runs  are  both  predetermined. 
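The selection rule and the resulting fixed run count can be sketched briefly. The run-size helper assumes that each stage uses a PB design, so that up to 4m-1 factors are studied in 4m runs, and that all groups have the same size; the function names and the effect estimates are hypothetical.

```python
import numpy as np

def pb_runs(n_factors):
    """Smallest multiple of four that exceeds n_factors (PB run size, assumed available)."""
    return 4 * (n_factors // 4 + 1)

def select_top_groups(group_effects, m):
    """Indices of the m group-factors with the largest absolute estimated effects."""
    magnitudes = np.abs(np.asarray(group_effects, dtype=float))
    order = np.argsort(-magnitudes, kind="stable")
    return sorted(order[:m].tolist())

def total_runs(n_groups, group_size, m):
    """Total runs are fixed in advance: first stage plus a second stage on m * group_size factors."""
    return pb_runs(n_groups) + pb_runs(m * group_size)

# Hypothetical first-stage estimates for 7 group-factors of 3 factors each, with m = 2.
first_stage = [0.2, 4.1, -0.3, -3.2, 0.1, 0.4, -0.2]
carried = select_top_groups(first_stage, m=2)
print(carried, total_runs(n_groups=7, group_size=3, m=2))
```

Unlike the regular GS procedure, the total here does not depend on the observed responses; only the identity of the carried-over groups does.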

The  advantages  of  modified  GS  designs  are  the  same  as  for 
regular  GS  designs.  The  same  disadvantages  also  apply  except  of 
course  that  the  total  number  of  runs  is  fixed  in  modified  GS  and 
is random in regular GS. An additional disadvantage of modified
GS is that the prespecified value of m, the number of group-factors
to be carried over to the second stage, may be too small
to reasonably ensure that all of the apparently significant
groups reach the second stage.

Further  research  and  practical  experience  on  modified  GS 
designs is needed. However, preliminary indications are that the
performance of this strategy is comparable to that of regular
two-stage GS if the proportion of important effects is not too
large. 

3.5  T-Optimal  Designs 

In a two-level ($\pm 1$) design, the inner product between any
two design columns is a measure of their orthogonality. SS
designs,  which  were  discussed  in  Section  3.2,  are  constructed 
with  the  objective  of  minimizing  the  maximum  absolute  inner 
product  between  any  two  distinct  design  columns.  One  can  define 
other  criteria,  however,  for  measuring  the  optimality  of  a 
supersaturated  design. 

We define a design to be T-optimal in a given class of
designs if it minimizes, over all designs in that class, the
trace of $(X'X)^2$, where X is as defined previously. Equivalently,
a  design  is  T-optimal  if  it  minimizes  the  sum  of  squared  inner 
products  of  all  pairs  of  columns  in  X.  Thus,  the  columns  of  X 
are,  in  a  certain  sense,  as  nearly  orthogonal  as  possible. 
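The equivalence between the trace criterion and the sum of squared inner products follows from the symmetry of $X'X$, and it is easy to confirm numerically. The sketch below uses a hypothetical 4-run, 4-factor design with a leading column of +1's; because every column of a $\pm 1$ design has squared length N, the diagonal contributes the constant $(K+1)N^2$ and only the off-diagonal terms vary between designs.

```python
import numpy as np

def t_criterion(X):
    """Trace of (X'X)^2."""
    G = X.T @ X
    return float(np.trace(G @ G))

def sum_sq_offdiag(X):
    """Sum of squared inner products over ordered pairs of distinct columns of X."""
    G = X.T @ X
    return float(np.sum(G ** 2) - np.sum(np.diag(G) ** 2))

# Hypothetical design: N = 4 runs, K = 4 factors, first and last columns identical.
D = np.array([[+1, +1, +1, +1],
              [+1, -1, -1, +1],
              [-1, +1, -1, -1],
              [-1, -1, +1, -1]], dtype=float)
X = np.column_stack([np.ones(4), D])
N, n_cols = X.shape
print(t_criterion(X), sum_sq_offdiag(X), n_cols * N ** 2)
```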

The  principal  disadvantage  of  supersaturated  T-optimal 
designs  is  that  rules  for  their  general  construction  have  not 
been  developed,  nor  have  any  such  designs  been  tabulated  within 
the class of two-level ($\pm 1$) designs. However, in studies where
the constant term $\beta_0$ is known or an advance estimate is
available,  rules  for  the  construction  of  supersaturated  T-optimal 
designs  can  essentially  be  found  in  Morris  and  Mitchell  (1983), 
who  derived  such  designs  in  the  process  of  obtaining  their 
trace-L  optimal  designs  for  detecting  two-factor  interactions.  A 
second  disadvantage  is  that,  as  in  RB  and  SS  designs,  the 
analysis  of  T-optimal  designs  is  made  difficult  by  the 
confounding  of  factor  effects. 

3.6  R-Optimal  Designs 

In matrix terms, the screening model introduced in Section 2
can be expressed as

$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where $\mathbf{y}$ is an N×1 vector of responses, $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_K)'$ is a
(K+1)×1 vector of parameters, $\boldsymbol{\varepsilon}$ is an N×1 vector of random error
terms with mean 0 and variance $\sigma^2$, and X is an N×(K+1) matrix of
coefficients of the parameters $\beta_0, \beta_1, \ldots, \beta_K$. We shall assume
that $N \le K$ and that X is of rank N. For simplicity in the
following discussion, we shall also assume that $\sigma^2 = 0$, so that
$\mathbf{y} = X\boldsymbol{\beta}$.

Consider, then, the supersaturated system of linear
equations $\mathbf{y} = X\boldsymbol{\beta}$. Since $N \le K$, this system is underdetermined and
therefore possesses infinitely many solutions. It can be shown,
however, that the solution which has minimum length is given by

$$\hat{\boldsymbol{\beta}} = X'(XX')^{-1}\mathbf{y}. \qquad (6)$$

Because $\hat{\boldsymbol{\beta}}$ is the minimum length solution, it ensures, in some
sense, that factors are treated equally in the estimation
process. For example, if two or more factors are completely
confounded, the effect of each factor will be estimated by the
average of their combined effects.


Substituting $\mathbf{y} = X\boldsymbol{\beta}$ into (6), we obtain, in terms of the true
$\boldsymbol{\beta}$,

$$\hat{\boldsymbol{\beta}} = X'(XX')^{-1}X\boldsymbol{\beta} = R\boldsymbol{\beta}$$

where $R = X'(XX')^{-1}X$. We observe that R is a (K+1)×(K+1)
symmetric idempotent matrix of rank N and is the projection
matrix operator onto the space spanned by the rows of X. In
addition, since R is a projection matrix, we note that $0 \le r_{ii} \le 1$,
where $r_{ii}$ designates the ith diagonal element of R. Furthermore,
the  sum  of  all  diagonal  elements  of  R  equals  N,  since  the  trace 
of  R  equals  its  rank. 
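The minimum length solution and the properties of R can be illustrated on a small hypothetical system. In the sketch below, N = 4 and K = 5, the fourth factor column duplicates the first, and only the first factor truly has an effect; the matrix and coefficient values are assumptions chosen so the result can be followed by hand.

```python
import numpy as np

x1 = np.array([1.0, 1.0, -1.0, -1.0])
x2 = np.array([1.0, -1.0, 1.0, -1.0])
x3 = np.array([1.0, -1.0, -1.0, 1.0])
# Hypothetical X of rank 4: constant column, then 5 factor columns (two are confounded pairs).
X = np.column_stack([np.ones(4), x1, x2, x3, x1, -x2])
beta = np.array([1.0, 2.0, 0.0, 0.0, 0.0, 0.0])   # true parameters, sigma^2 = 0
y = X @ beta

# Minimum length solution X'(XX')^{-1} y and projection matrix R = X'(XX')^{-1} X.
beta_hat = X.T @ np.linalg.solve(X @ X.T, y)
R = X.T @ np.linalg.solve(X @ X.T, X)

print(beta_hat)        # the effect of 2.0 is shared equally by the two identical columns
print(np.diag(R), np.trace(R))
```

The same solution is returned by `np.linalg.pinv(X) @ y`, since the Moore-Penrose pseudoinverse also yields the minimum-norm solution of a consistent underdetermined system.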


The notion of R-optimal designs was introduced by Mitchell,
Hunter, and Showers (1980), who considered a Bayesian analysis of
the supersaturated screening problem. Assuming that $\sigma^2 = 0$ and no
constant term $\beta_0$ is present and making certain prior assumptions
regarding $\beta_1, \beta_2, \ldots, \beta_K$, $\hat{\boldsymbol{\beta}}$ was obtained as the Bayes estimate of
$\boldsymbol{\beta}$. It may be noted that the coefficient matrix X, in this case,
does not include a column of +1's corresponding to the constant
term $\beta_0$.


The variance-covariance matrix of the posterior distribution
of $\boldsymbol{\beta}$, as derived by Mitchell, Hunter, and Showers, was found to
be proportional to the matrix I-R. Consideration of this result
led to the following design criterion. A design is said to be
R-optimal in a given class of designs if it minimizes, over all
designs in that class, the maximum diagonal element of R. This
amounts to making the diagonal elements of R, hence also I-R, as
nearly equal as possible (since the trace of R is fixed). Thus,
an R-optimal design would, in some sense, provide equal
information on all the $\beta_j$'s. It follows that if a design exists
such that all $r_{ii}$ are equal, that design is R-optimal.

In a related Bayesian treatment of this problem, Anscombe
(1963) imposed certain restrictions on the coefficient matrix X
in order to expedite calculation of the posterior distribution.
One of these was that the rows of X are orthogonal. It is easy
to show that if X is a row-orthogonal matrix, the diagonal
elements of R are all equal, and consequently the associated
design is R-optimal. This indicates that, for certain values of
N and K, a supersaturated R-optimal design can be readily
obtained by transposing a (K+1)×N ($N \le K$) column-orthogonal
matrix (where a row of +1's is reserved as coefficients of the
constant term $\beta_0$).
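The transposition argument can be illustrated with a Hadamard matrix, which is one convenient source of column-orthogonal $\pm 1$ matrices; this choice, the sizes N = 4 and K = 7, and the function name are assumptions of the sketch.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return H

N, K = 4, 7
C = hadamard(K + 1)[:, :N]     # (K+1) x N, orthogonal columns, first row all +1
X = C.T                        # N x (K+1), orthogonal rows, first column all +1

R = X.T @ np.linalg.solve(X @ X.T, X)
print(np.diag(R))              # every diagonal element equals N / (K + 1)
```

Since the trace of R is N and R has K+1 diagonal elements, equal diagonals must each be N/(K+1), here 0.5.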

The  principal  advantage  to  R-optimal  designs  is  that,  under 
certain restrictions, estimation of $\boldsymbol{\beta}$ is isotropic, i.e., the
posterior variances of the $\beta_j$ are all equal. The principal
disadvantage  of  such  designs  is  that  their  general  performance 
characteristics  in  factor  screening  experiments  have  not  been 
fully  evaluated. 


3.7  Search  Designs 

The  factor  screening  problem  can  also  be  formulated  as  a 
"search"  problem,  following  Srivastava  (1975).  Suppose  it  is 
known  that  at  most  k  of  the  K  factor  effects  are  non-zero  and  the 
remaining  (K-k)  effects  are  zero.  The  goal  of  the  search  problem 
is  to  determine  the  k  non-zero  effects  and  estimate  them. 

Srivastava developed design criteria for obtaining search
designs in the case where no experimental error is present (i.e.,
$\sigma^2 = 0$). We can write the coefficient matrix X as $X = [\mathbf{1} \; D]$, where $\mathbf{1}$
is an N×1 vector of +1's and D is an N×K design matrix. In order
to ensure that the k non-zero effects can be uniquely determined
from the $K!/[k!(K-k)!]$ possible subsets, every subset of 2k columns
of D together with the column vector $\mathbf{1}$ must have a combined rank
of 2k+1. This implies, of course, that $N \ge 2k+1$.
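The rank condition can be checked by brute force for small problems, and the error-free search itself can be carried out by fitting every subset of k columns. The sketch below is an illustration under those assumptions, not Srivastava's construction; the function names, the 4-run design with K = 7 unbalanced columns, and the response are hypothetical, and the subset enumeration grows combinatorially with K and k.

```python
import itertools
import numpy as np

def satisfies_search_condition(D, k):
    """True if the constant column plus every set of 2k columns of D has rank 2k+1."""
    D = np.asarray(D, dtype=float)
    ones = np.ones((D.shape[0], 1))
    for cols in itertools.combinations(range(D.shape[1]), 2 * k):
        sub = np.hstack([ones, D[:, list(cols)]])
        if np.linalg.matrix_rank(sub) < 2 * k + 1:
            return False
    return True

def consistent_subsets(D, y, k):
    """All size-k column subsets that, with the constant, reproduce y exactly (no error)."""
    D = np.asarray(D, dtype=float)
    ones = np.ones((D.shape[0], 1))
    hits = []
    for cols in itertools.combinations(range(D.shape[1]), k):
        A = np.hstack([ones, D[:, list(cols)]])
        coef = np.linalg.lstsq(A, y, rcond=None)[0]
        if np.allclose(A @ coef, y):
            hits.append(cols)
    return hits

# Hypothetical design: all -1/+1 columns of length 4 that start with +1, except the all-ones column.
patterns = [p for p in itertools.product([1, -1], repeat=3) if p != (1, 1, 1)]
D = np.array([(1,) + p for p in patterns]).T          # N = 4 runs, K = 7 factors
y = 3.0 + 2.0 * D[:, 4]                               # only the fifth factor is active

print(satisfies_search_condition(D, k=1))
print(consistent_subsets(D, y, k=1))
D_bad = np.column_stack([D, D[:, 0]])                 # a duplicated column breaks the condition
print(satisfies_search_condition(D_bad, k=1))
```

Here N = 4 satisfies $N \ge 2k+1 = 3$ for k = 1 even though K = 7 exceeds N, so the single active factor is recovered uniquely.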

Smaller sized search designs are possible, however, under
the further restriction that the response vector $\mathbf{y}$ does not lie
in the intersection of two competing subspaces. An equivalent
restriction is that no linear relationships exist among the $\beta_j$'s.
In this case, X may be chosen so that no two (k+1)-dimensional
column subspaces are identical, where each subspace includes the
column vector $\mathbf{1}$. This implies, of course, that $N > k+1$.

The  principal  advantage  of  search  designs  is  that 
theoretically  the  non-negligible  effects  should  be  identified  and 
estimated  with  reasonable  power,  since  this  is  an  inherent 
condition  of  the  construction  of  such  designs. 

There  are  three  major  disadvantages  of  search  designs.  The 
first  of  these  is  that  construction  of  two-level  (+1)  search 
designs  is  extremely  difficult,  particularly  for  large-scale 
simulation studies. A second disadvantage is that it is assumed
that the maximum number of non-negligible effects, k, is known.
It is unclear, however, what impact misspecification of k will
have on the search procedure. Finally, the analysis of search
designs,  which  is  based  on  subset  regressions,  may  require  a 
prohibitive  number  of  computations  even  for  moderate  values  of  N, 
K,  and  k. 

4.  SUMMARY  DISCUSSION 

In  this  paper  we  have  presented  a  description  and 
comparative discussion of seven different types of supersaturated
designs  which  have  been  suggested  for  use  in  factor  screening 
experiments,  with  application  to  the  study  of  large-scale 
computer  simulation  models.  Because  of  the  lack  of  comparative 
performance  data,  there  are  currently  no  definitive  guidelines 
for  the  selection  and  use  of  supersaturated  screening  methods. 
Nevertheless, of those methods surveyed, the group screening
method has been generally recommended, and we would concur with
this recommendation except when the number of runs relative to
the number of factors is severely limited. In such a case, Mauro
and  Burns  (1984)  found  that  the  performance  of  group  screening 
can  be  extremely  poor,  even  for  detecting  the  large  effects.  In 
such  situations,  then,  alternative  design  strategies,  such  as 
systematic  supersaturated  designs,  should  be  considered.  From  a 
practical  point  of  view,  although  the  screening  plans  considered 
in  this  paper  are  appealing,  further  theoretical  development  of 
these  and  other  methods  is  needed,  particularly  in  relation  to 
the  study  of  computer  simulations  per  se. 